Data Loading
Using PowerQuery, I imported the data from a CSV
file. After analyzing the column descriptions, I confirmed that none of
them contained errors or missing values. However, I quickly noticed that
the dataset includes missing days — some dates are
absent from the data. Interestingly, in both datasets
(alr_d, ing_d), the missing days are exactly
the same.
Alior Bank company data
ING Bank company data
Missing Data Analysis
I checked what percentage of the entire dataset is made up of missing
days. If it were less than 5%, they could have been omitted. However, in
our case, the missing data accounts for as much as 32%,
so ignoring them could distort the entire distribution analysis.
Generated set of all dates in the analyzed
period
Final dataset in Excel
Data Preparation
The table with the required data was saved to a CSV file. Then, in
VSCode, I replaced commas , with dots
. (06_change_comma_to_dot), because Excel in
my region saves numbers with a comma as the decimal separator.
Separator replacement
Linear Interpolation of Missing Values
The next step is to perform linear interpolation of
the missing data.
Definition of Linear Interpolation
Linear interpolation is a method to estimate missing values by
assuming a linear change between two known points.
Mathematically, if a value \(y\) at
position \(x\) is missing between two
known points \((x_0, y_0)\) and \((x_1, y_1)\), it is computed as:
\[
y = y_0 + \frac{y_1 - y_0}{x_1 - x_0} \cdot (x - x_0)
\]
Example
Consider the dataset:
\[
x = (3, 7, \text{NA}, 5, 8, \text{NA}, \text{NA}, 11)
\]
- The first missing value (
NA) is between 7 (position 2)
and 5 (position 4). Using the formula:
\[
y = 7 + \frac{5 - 7}{4 - 2} \cdot (3 - 2) = 7 + (-1) = 6
\]
- The second missing value (
NA) is between 8 (position 5)
and 11 (position 8). Since there are two consecutive
NAs, we interpolate them linearly:
\[
y = 8 + \frac{11 - 8}{8 - 5} \cdot (6 - 5) = 8 + 1 = 9
\]
\[
y = 8 + \frac{11 - 8}{8 - 5} \cdot (7 - 5) = 8 + 2 = 10
\]
After linear interpolation, the completed dataset becomes:
\[
x = (3, 7, 6, 5, 8, 9, 10, 11)
\]
Loading Data
I load the data into R and split it into the appropriate columns:
data <- read.csv(
file = "data.csv",
header = TRUE,
sep = ";"
)
data
data_dates <- data$date
data_alior <- data$close_alr
data_ing <- data$close_ing
Using Build-In Functions
I use the approx function in R for linear
interpolation.
The approx function works as follows:
It takes a vector of known points:
x: representing indexes of non-empty values,
y: representing values at indexes at
x.
It computes values at target points (xout) through
linear interpolation, i.e., it determines a point on
the line connecting two neighboring known points.
The parameter rule = 2 ensures that values outside
the range of known points are extrapolated instead of
returning NA.
Below is my custom function that fills in the missing values:
linear_interpolation <- function(data) {
na_indexes <- c()
no_na_indexes <- c()
no_na_values <- c()
for (i in 1:length(data)) {
if (is.na(data[i])) {
na_indexes <- c(na_indexes, i)
} else {
no_na_indexes <- c(no_na_indexes, i)
no_na_values <- c(no_na_values, data[i])
}
}
interpolated_data <- approx(
x = no_na_indexes,
y = no_na_values,
xout = na_indexes,
rule = 2
)
response <- data
response[na_indexes] <- interpolated_data$y
return(response)
}
I perform the interpolation separately for each column:
data_alior_int <- linear_interpolation(data_alior)
data_ing_int <- linear_interpolation(data_ing)
Saving the interpolated data
I save the interpolated data into a CSV file so that it can be used
for further analysis:
new_data <- data.frame(
date = data_dates,
closes_alior = data_alior_int,
closes_ing = data_ing_int
)
write.csv(
x = new_data,
file = "int_data.csv",
row.names = FALSE
)
new_data
Summary
Thanks to this step, the dataset is now complete, and the missing
values have been filled using linear interpolation.
This ensures that the data is ready for further analysis of the closing
price distributions.
LS0tCnRpdGxlOiAiQ2hhcHRlciAyOiBEYXRhIFByZXBhcmF0aW9uIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgojIyBEYXRhIExvYWRpbmcKClVzaW5nICoqUG93ZXJRdWVyeSoqLCBJIGltcG9ydGVkIHRoZSBkYXRhIGZyb20gYSBDU1YgZmlsZS4gQWZ0ZXIgYW5hbHl6aW5nIHRoZSBjb2x1bW4gZGVzY3JpcHRpb25zLCBJIGNvbmZpcm1lZCB0aGF0IG5vbmUgb2YgdGhlbSBjb250YWluZWQgZXJyb3JzIG9yIG1pc3NpbmcgdmFsdWVzLiBIb3dldmVyLCBJIHF1aWNrbHkgbm90aWNlZCB0aGF0IHRoZSBkYXRhc2V0IGluY2x1ZGVzICoqbWlzc2luZyBkYXlzKiog4oCUIHNvbWUgZGF0ZXMgYXJlIGFic2VudCBmcm9tIHRoZSBkYXRhLiBJbnRlcmVzdGluZ2x5LCBpbiBib3RoIGRhdGFzZXRzIChgYWxyX2RgLCBgaW5nX2RgKSwgdGhlIG1pc3NpbmcgZGF5cyBhcmUgZXhhY3RseSB0aGUgc2FtZS4KCiFbKkFsaW9yIEJhbmsgY29tcGFueSBkYXRhKl0oLi4vc2NyZWVuc2hvdHMvMDFfYWxyLnBuZykgIAoKCiFbKklORyBCYW5rIGNvbXBhbnkgZGF0YSpdKC4uL3NjcmVlbnNob3RzLzAyX2luZy5wbmcpCgojIyBNaXNzaW5nIERhdGEgQW5hbHlzaXMKCkkgY2hlY2tlZCB3aGF0IHBlcmNlbnRhZ2Ugb2YgdGhlIGVudGlyZSBkYXRhc2V0IGlzIG1hZGUgdXAgb2YgbWlzc2luZyBkYXlzLiBJZiBpdCB3ZXJlIGxlc3MgdGhhbiA1JSwgdGhleSBjb3VsZCBoYXZlIGJlZW4gb21pdHRlZC4gSG93ZXZlciwgaW4gb3VyIGNhc2UsIHRoZSBtaXNzaW5nIGRhdGEgYWNjb3VudHMgZm9yIGFzIG11Y2ggYXMgKiozMiUqKiwgc28gaWdub3JpbmcgdGhlbSBjb3VsZCBkaXN0b3J0IHRoZSBlbnRpcmUgZGlzdHJpYnV0aW9uIGFuYWx5c2lzLgoKIVsqR2VuZXJhdGVkIHNldCBvZiBhbGwgZGF0ZXMgaW4gdGhlIGFuYWx5emVkIHBlcmlvZCpdKC4uL3NjcmVlbnNob3RzLzA0X2ltcG9ydGVkX2RhdGVzLnBuZykgIAoKCiFbKkZpbmFsIGRhdGFzZXQgaW4gRXhjZWwqXSguLi9zY3JlZW5zaG90cy8wNV9yZWFkeV9kYXRhX2V4Y2VsLnBuZykKCiMjIERhdGEgUHJlcGFyYXRpb24KClRoZSB0YWJsZSB3aXRoIHRoZSByZXF1aXJlZCBkYXRhIHdhcyBzYXZlZCB0byBhIENTViBmaWxlLiBUaGVuLCBpbiAqKlZTQ29kZSoqLCBJIHJlcGxhY2VkIGNvbW1hcyBgLGAgd2l0aCBkb3RzIGAuYCAoYDA2X2NoYW5nZV9jb21tYV90b19kb3RgKSwgYmVjYXVzZSBFeGNlbCBpbiBteSByZWdpb24gc2F2ZXMgbnVtYmVycyB3aXRoIGEgY29tbWEgYXMgdGhlIGRlY2ltYWwgc2VwYXJhdG9yLiAgCgohWypTZXBhcmF0b3IgcmVwbGFjZW1lbnQqXSguLi9zY3JlZW5zaG90cy8wNl9jaGFuZ2VfY29tbWFfdG9fZG90LnBuZykKCgojIyBMaW5lYXIgSW50ZXJwb2xhdGlvbiBvZiBNaXNzaW5nIFZhbHVlcwoKVGhlIG5leHQgc3RlcCBpcyB0byBwZXJmb3JtICoqbGluZWFyIGludGVycG9sYXRpb24qKiBvZiB0aGUgbWlzc2luZyBkYXRhLiAgCgojIyMgRGVmaW5pdGlvbiBvZiBMaW5lYXIgSW50ZXJwb2xhdGlvbgoKTGluZWFyIGludGVycG9sYXRpb24gaXMgYSBtZXRob2QgdG8gZXN0aW1hdGUgbWlzc2luZyB2YWx1ZXMgYnkgYXNzdW1pbmcgYSBsaW5lYXIgY2hhbmdlIGJldHdlZW4gdHdvIGtub3duIHBvaW50cy4gIAoKTWF0aGVtYXRpY2FsbHksIGlmIGEgdmFsdWUgXCh5XCkgYXQgcG9zaXRpb24gXCh4XCkgaXMgbWlzc2luZyBiZXR3ZWVuIHR3byBrbm93biBwb2ludHMgXCgoeF8wLCB5XzApXCkgYW5kIFwoKHhfMSwgeV8xKVwpLCBpdCBpcyBjb21wdXRlZCBhczoKClxbCnkgPSB5XzAgKyBcZnJhY3t5XzEgLSB5XzB9e3hfMSAtIHhfMH0gXGNkb3QgKHggLSB4XzApClxdCgojIyMjIEV4YW1wbGUKCkNvbnNpZGVyIHRoZSBkYXRhc2V0OgoKXFsKeCA9ICgzLCA3LCBcdGV4dHtOQX0sIDUsIDgsIFx0ZXh0e05BfSwgXHRleHR7TkF9LCAxMSkKXF0KCjEuIFRoZSBmaXJzdCBtaXNzaW5nIHZhbHVlIChgTkFgKSBpcyBiZXR3ZWVuIDcgKHBvc2l0aW9uIDIpIGFuZCA1IChwb3NpdGlvbiA0KS4gVXNpbmcgdGhlIGZvcm11bGE6CgpcWwp5ID0gNyArIFxmcmFjezUgLSA3fXs0IC0gMn0gXGNkb3QgKDMgLSAyKSA9IDcgKyAoLTEpID0gNgpcXQoKMi4gVGhlIHNlY29uZCBtaXNzaW5nIHZhbHVlIChgTkFgKSBpcyBiZXR3ZWVuIDggKHBvc2l0aW9uIDUpIGFuZCAxMSAocG9zaXRpb24gOCkuIFNpbmNlIHRoZXJlIGFyZSAqKnR3byBjb25zZWN1dGl2ZSBOQXMqKiwgd2UgaW50ZXJwb2xhdGUgdGhlbSBsaW5lYXJseToKCi0gUG9zaXRpb24gNjogCgpcWwp5ID0gOCArIFxmcmFjezExIC0gOH17OCAtIDV9IFxjZG90ICg2IC0gNSkgPSA4ICsgMSA9IDkKXF0KCi0gUG9zaXRpb24gNzogCgpcWwp5ID0gOCArIFxmcmFjezExIC0gOH17OCAtIDV9IFxjZG90ICg3IC0gNSkgPSA4ICsgMiA9IDEwClxdCgpBZnRlciBsaW5lYXIgaW50ZXJwb2xhdGlvbiwgdGhlIGNvbXBsZXRlZCBkYXRhc2V0IGJlY29tZXM6CgpcWwp4ID0gKDMsIDcsIDYsIDUsIDgsIDksIDEwLCAxMSkKXF0KCiMjIyBMb2FkaW5nIERhdGEKCkkgbG9hZCB0aGUgZGF0YSBpbnRvIFIgYW5kIHNwbGl0IGl0IGludG8gdGhlIGFwcHJvcHJpYXRlIGNvbHVtbnM6CgpgYGB7cn0KZGF0YSA8LSByZWFkLmNzdigKICBmaWxlID0gImRhdGEuY3N2IiwKICBoZWFkZXIgPSBUUlVFLAogIHNlcCA9ICI7IgopCmBgYAoKYGBge3J9CmRhdGEKYGBgCgpgYGB7cn0KZGF0YV9kYXRlcyA8LSBkYXRhJGRhdGUKZGF0YV9hbGlvciA8LSBkYXRhJGNsb3NlX2FscgpkYXRhX2luZyA8LSBkYXRhJGNsb3NlX2luZwpgYGAKCiMjIyBVc2luZyBCdWlsZC1JbiBGdW5jdGlvbnMKCkkgdXNlIHRoZSBgYXBwcm94YCBmdW5jdGlvbiBpbiBSIGZvciBsaW5lYXIgaW50ZXJwb2xhdGlvbi4KClRoZSBgYXBwcm94YCBmdW5jdGlvbiB3b3JrcyBhcyBmb2xsb3dzOgoKLSBJdCB0YWtlcyBhIHZlY3RvciBvZiBrbm93biBwb2ludHM6CiAgLSBgeGA6IHJlcHJlc2VudGluZyBpbmRleGVzIG9mIG5vbi1lbXB0eSB2YWx1ZXMsCiAgLSBgeWA6IHJlcHJlc2VudGluZyB2YWx1ZXMgYXQgaW5kZXhlcyBhdCBgeGAuCgotIEl0IGNvbXB1dGVzIHZhbHVlcyBhdCB0YXJnZXQgcG9pbnRzIChgeG91dGApIHRocm91Z2ggKipsaW5lYXIgaW50ZXJwb2xhdGlvbioqLCBpLmUuLCBpdCBkZXRlcm1pbmVzIGEgcG9pbnQgb24gdGhlIGxpbmUgY29ubmVjdGluZyB0d28gbmVpZ2hib3Jpbmcga25vd24gcG9pbnRzLiAgCgotIFRoZSBwYXJhbWV0ZXIgYHJ1bGUgPSAyYCBlbnN1cmVzIHRoYXQgdmFsdWVzIG91dHNpZGUgdGhlIHJhbmdlIG9mIGtub3duIHBvaW50cyBhcmUgKipleHRyYXBvbGF0ZWQqKiBpbnN0ZWFkIG9mIHJldHVybmluZyBOQS4gIAoKQmVsb3cgaXMgbXkgY3VzdG9tIGZ1bmN0aW9uIHRoYXQgZmlsbHMgaW4gdGhlIG1pc3NpbmcgdmFsdWVzOgoKCmBgYHtyfQpsaW5lYXJfaW50ZXJwb2xhdGlvbiA8LSBmdW5jdGlvbihkYXRhKSB7CiAgbmFfaW5kZXhlcyA8LSBjKCkKICBub19uYV9pbmRleGVzIDwtIGMoKQogIG5vX25hX3ZhbHVlcyA8LSBjKCkKICBmb3IgKGkgaW4gMTpsZW5ndGgoZGF0YSkpIHsKICAgIGlmIChpcy5uYShkYXRhW2ldKSkgewogICAgICBuYV9pbmRleGVzIDwtIGMobmFfaW5kZXhlcywgaSkKICAgIH0gZWxzZSB7CiAgICAgIG5vX25hX2luZGV4ZXMgPC0gYyhub19uYV9pbmRleGVzLCBpKQogICAgICBub19uYV92YWx1ZXMgPC0gYyhub19uYV92YWx1ZXMsIGRhdGFbaV0pCiAgICB9CiAgfQogIAogIGludGVycG9sYXRlZF9kYXRhIDwtIGFwcHJveCgKICAgIHggPSBub19uYV9pbmRleGVzLAogICAgeSA9IG5vX25hX3ZhbHVlcywKICAgIHhvdXQgPSBuYV9pbmRleGVzLAogICAgcnVsZSA9IDIKICApCiAgCiAgcmVzcG9uc2UgPC0gZGF0YQogIHJlc3BvbnNlW25hX2luZGV4ZXNdIDwtIGludGVycG9sYXRlZF9kYXRhJHkKICAKICByZXR1cm4ocmVzcG9uc2UpCn0KYGBgCgpJIHBlcmZvcm0gdGhlIGludGVycG9sYXRpb24gc2VwYXJhdGVseSBmb3IgZWFjaCBjb2x1bW46CgpgYGB7cn0KZGF0YV9hbGlvcl9pbnQgPC0gbGluZWFyX2ludGVycG9sYXRpb24oZGF0YV9hbGlvcikKZGF0YV9pbmdfaW50IDwtIGxpbmVhcl9pbnRlcnBvbGF0aW9uKGRhdGFfaW5nKQpgYGAKCiMjIyBTYXZpbmcgdGhlIGludGVycG9sYXRlZCBkYXRhCgpJIHNhdmUgdGhlIGludGVycG9sYXRlZCBkYXRhIGludG8gYSBDU1YgZmlsZSBzbyB0aGF0IGl0IGNhbiBiZSB1c2VkIGZvciBmdXJ0aGVyIGFuYWx5c2lzOgoKYGBge3J9Cm5ld19kYXRhIDwtIGRhdGEuZnJhbWUoCiAgZGF0ZSA9IGRhdGFfZGF0ZXMsCiAgY2xvc2VzX2FsaW9yID0gZGF0YV9hbGlvcl9pbnQsCiAgY2xvc2VzX2luZyA9IGRhdGFfaW5nX2ludAopCgp3cml0ZS5jc3YoCiAgeCA9IG5ld19kYXRhLAogIGZpbGUgPSAiaW50X2RhdGEuY3N2IiwKICByb3cubmFtZXMgPSBGQUxTRQopCmBgYAoKYGBge3J9Cm5ld19kYXRhCmBgYAoKIyMgU3VtbWFyeQoKVGhhbmtzIHRvIHRoaXMgc3RlcCwgdGhlIGRhdGFzZXQgaXMgbm93IGNvbXBsZXRlLCBhbmQgdGhlIG1pc3NpbmcgdmFsdWVzIGhhdmUgYmVlbiBmaWxsZWQgdXNpbmcgbGluZWFyIGludGVycG9sYXRpb24uICAKVGhpcyBlbnN1cmVzIHRoYXQgdGhlIGRhdGEgaXMgcmVhZHkgZm9yIGZ1cnRoZXIgYW5hbHlzaXMgb2YgdGhlIGNsb3NpbmcgcHJpY2UgZGlzdHJpYnV0aW9ucy4K